This notebook provides examples of how Hierarchical (Multilevel / Mixed Effects) Models induce estimate shrinkage via partial pooling (an effect similar to regularization) for nested data.

The original dataset is related to website bounce rates and can be found in a Kaggle notebook.

The overall goal is to understand how the average bounce time on a website changes with age across counties.

Note: For simplicity we’re ignoring the fact that each county in the dataset contains different locations. If you’d like to consider the effect of those locations, you can use the lmer package to fit a crossed/nested model structure.

Load Packages

require(tidyverse)
require(dplyr) # dplyr 1.0.0
require(knitr)
require(kableExtra)
require(broom)

# Animation library
require(gganimate)
require(gifski)
require(png)

# Mixed Effects Model libraries
require(lme4)
require(brms)
require(tidybayes)
require(arm)
require(bayesplot)

# Prior libraries
require(extraDistr)

Load Data

We first load the data. I’ve simulated data based on the original, varying the group sizes so that some counties have few data points and others have more. Think of this as a scenario where some data was corrupted and you can only use a subset of the data points from a county.

It will also help illustrate partial-pooling and how Bayesian methods can help in later sections.

bounce_data <- read_csv("../data/bounce_rates_sim.csv", col_types = cols() ) #cols() suppresses messages

kable(bounce_data) %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  row_spec(0, background = "#4CAF50", color="#FFF")  %>% 
  scroll_box(width = "100%", height = "200px") 
bounce_time age county
225.8538 54 kent
217.9062 79 kent
220.6601 71 kent
223.4206 47 kent
...
(613 rows in total, across 8 counties: kent, essex, london, devon, dorset, cumbria, norfolk, cheshire)

We notice there isn’t any missing data, but we can center-scale the age variable. This is helpful when interpreting the model coefficients in our case (especially when the dependent and independent variables are on different scales, e.g. a population variable).

# Check distribution of data
summary(bounce_data)
##   bounce_time         age            county         
##  Min.   :156.0   Min.   : -1.00   Length:613        
##  1st Qu.:190.3   1st Qu.: 32.00   Class :character  
##  Median :202.6   Median : 48.00   Mode  :character  
##  Mean   :200.0   Mean   : 49.01                     
##  3rd Qu.:210.8   3rd Qu.: 66.00                     
##  Max.   :234.5   Max.   :129.00

Standardize (center-scale)

In this case, we standardize primarily to avoid having the variance of our linear model estimates be on different scales (i.e. it provides some stability to our estimates). In addition, if we didn’t standardize (center-scale) the data, the intercept would be interpreted as the “expected bounce time when a person is 0 years old”, which doesn’t make much sense. To fix this we standardize the age variable, so we can interpret the effect of the variable in terms of deviations from the mean age in the dataset (e.g. “an increase of one standard deviation in age changes the expected bounce time by X amount”). The intercept can then be interpreted as the expected bounce time at the mean age.
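As a quick sanity check, `scale()` is equivalent to subtracting the mean and dividing by the standard deviation; a minimal base-R sketch using a few example ages:

```r
# Minimal sketch: manual center-scaling matches scale()
age_example <- c(54, 79, 71, 47, 93, 64, 91)           # a few example ages
std_manual  <- (age_example - mean(age_example)) / sd(age_example)
std_scale   <- scale(age_example)[, 1]                 # scale() returns a matrix
all.equal(std_manual, std_scale)                       # TRUE
```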

# Standardize data
bounce_data <- bounce_data %>% 
  mutate(std_age = scale(age)[,1]) %>% 
  dplyr::relocate(std_age, .after=age)
  
# Example std_age data
summary(bounce_data)
##   bounce_time         age            std_age            county         
##  Min.   :156.0   Min.   : -1.00   Min.   :-2.22640   Length:613        
##  1st Qu.:190.3   1st Qu.: 32.00   1st Qu.:-0.75726   Class :character  
##  Median :202.6   Median : 48.00   Median :-0.04496   Mode  :character  
##  Mean   :200.0   Mean   : 49.01   Mean   : 0.00000                     
##  3rd Qu.:210.8   3rd Qu.: 66.00   3rd Qu.: 0.75639                     
##  Max.   :234.5   Max.   :129.00   Max.   : 3.56111

Complete Pooling (Simple Linear Regression)

A simple linear regression on this data would be considered a complete pooling scenario because we are grouping all data together and not considering the county grouping structure. We can fit this model with the lm function.

complete_pool <- lm(bounce_time ~ std_age, 
                      data = bounce_data )

  
tidy(complete_pool)
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   200.       0.570    351.   0.      
## 2 std_age         4.68     0.571      8.20 1.39e-15
ggplot(bounce_data, aes(x=std_age, y=bounce_time, color=county)) +
  geom_point(alpha=0.5) +
  geom_abline(aes(intercept=coef(complete_pool)[1], slope=coef(complete_pool)[2])) +
  labs(x="Standardized Age", y="Bounce time", title="Simple linear regression (population trendline)") 

Diagnostics

Our linear regression model rests on the assumptions that 1) the residuals are normally distributed and 2) their variance is constant. So we can evaluate our model’s fit using the residuals vs. fitted plot below. If the assumptions are met, we won’t see any trend in the residuals in relation to the predicted values (i.e. they won’t increase/decrease with predicted values). However, in our case we see there is a trend, which violates this assumption and indicates a lack of fit.

# Plot diagnostic plot
qplot(x= .fitted, y= .resid, data = complete_pool, alpha=0.8) +
  geom_smooth(se = FALSE, size=1) +
  labs(y="Observed - Predicted", x= "Predicted", 
      title="Complete pooling model shows slight lack of fit (variance isn't constant)") +
  theme(legend.position = "none",
        title = element_text(size=10))

Evaluation

We also compute the RMSE for this model to set a baseline to compare subsequent models. Overall, this isn’t too bad.

# RMSE Helper function
rmse <- function(ytrue, ypred){
  mse = mean((ytrue - ypred)^2)
  return (sqrt(mse))
}

# Predict on train set
y_hat <- predict(complete_pool)

kable(tibble(RMSE= rmse(bounce_data$bounce_time, y_hat))) %>% 
  row_spec(0, background = "#4CAF50", color="#FFF") %>% 
  kable_styling(full_width = FALSE, position = "left")
RMSE
14.09131

No Pooling (Individual County Linear Regressions)

Next let’s explore the scenario where there is no pooling (i.e. we consider each county individually and fit a linear model to each) to see if this improves fit and performance. First, let’s make sure this makes sense and that each county can be considered an individual group.

# Check if there's variability across groups
ggplot(bounce_data, aes(x=county , y=bounce_time, fill=county)) +
  geom_boxplot(alpha =0.7) +
  labs(x="County", y="Bounce Time", 
       title="Bounce times variance and means seem to be different across county groups") +
  theme(legend.position = "none")

(Note: this model throws a warning regarding a “singular fit”, which will be relevant in subsequent sections, including the Bayesian section. It simply means the model specified is too complex to fit given the data. This can happen when we try to estimate multiple random effects (e.g. a random intercept AND slope) for groups with small sample sizes.)

# No pool model fit (i.e. no fixed effects)
no_pool <- lmList(bounce_time ~ std_age|county, data = bounce_data)

summary(no_pool)
## Call:
##   Model: bounce_time ~ std_age | NULL 
##    Data: bounce_data 
## 
## Coefficients:
##    (Intercept) 
##          Estimate Std. Error   t value      Pr(>|t|)
## cheshire 212.2158  0.7946394 267.05925  0.000000e+00
## cumbria  208.1807  0.9306684 223.68944  0.000000e+00
## devon    197.7474  1.7131615 115.42834  0.000000e+00
## dorset   200.3983  1.1399960 175.78859  0.000000e+00
## essex    207.4708  2.0042418 103.51584  0.000000e+00
## kent     219.7752  5.1018392  43.07764 2.480189e-185
## london   179.5314  2.7137768  66.15556 5.183271e-277
## norfolk  180.5582  1.1881604 151.96449  0.000000e+00
##    std_age 
##            Estimate Std. Error    t value   Pr(>|t|)
## cheshire  1.4704786  0.9574905  1.5357632 0.12512625
## cumbria   1.2194264  1.0406277  1.1718180 0.24173755
## devon     1.3343291  1.2628822  1.0565745 0.29113318
## dorset    2.1695580  1.1439588  1.8965351 0.05837178
## essex    -0.7750287  2.9945313 -0.2588147 0.79586742
## kent      0.6071768  4.1510922  0.1462692 0.88375829
## london    0.3313303  2.2795036  0.1453519 0.88448206
## norfolk   0.5509413  1.1939462  0.4614457 0.64464693
## 
## Residual standard error: 7.973223 on 597 degrees of freedom

Diagnostics

Adding individual trendlines seems to improve the model’s fit, given that we only see minor trends on the right side of the plot.

# Plot diagnostic plot
no_pool_df <- tibble(fit = fitted(no_pool),
                     resid = residuals(no_pool))

qplot(x= fit, y= resid, data = no_pool_df, alpha=0.8) +
  geom_smooth(se = FALSE, size=1) +
  labs(y="Observed - Predicted", x= "Predicted", 
      title="No pooling model shows better fit (variance looks constant)") +
  theme(legend.position = "none")

Evaluation

This model improves predictive performance compared to the complete pooling model, which is expected if we fit a regression line for each group individually.

# Predict on train set
y_hat_np <- predict(no_pool)

# Get RMSEs to compare
np_preds <- tibble(rmse = "RMSE",
                   complete_pool =rmse(bounce_data$bounce_time, y_hat),
                   no_pool= rmse(bounce_data$bounce_time, y_hat_np)) %>% 
  column_to_rownames(., var="rmse")
               
# Table
kable(np_preds, digits = 3) %>% 
  row_spec(0, background = "#4CAF50", color="#FFF") %>% 
  kable_styling(full_width = FALSE, position = "left")
complete_pool no_pool
RMSE 14.091 7.868

This sounds appealing if you’re interested in predictive performance; however, we must be careful when it comes to interpretation and make sure the model answers our original goal.

  1. First of all, some of the group estimates were computed with very small sample sizes (e.g. \(n = 7\)), which can lead to higher variance and out-of-sample RMSE (e.g. a new data point / outlier might change the trendline direction completely).
  2. Second, we are reducing the effective sample size to the per-group sample sizes, which might lead to Type I errors when performing multiple comparisons.
  3. Most importantly, the model doesn’t account for relationships or correlations between the groups and doesn’t help us answer the original question of how age affects bounce times across counties.
# Plot no pool OLS fits per county
ggplot(bounce_data, aes(x=std_age, y=bounce_time, color=county)) +
  geom_point(alpha=0.5) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~county) +
  labs(x="Age (standardized)", y="Bounce Time",
       title="Kent and Essex's estimated trendlines rely on very few data points") +
  theme(legend.position = "none", 
        title = element_text(size=10))

Complete pooling to no pooling

So what does this look like in terms of the estimated slope and intercept parameters? The animation below shows how the intercept and std_age estimates change when we consider all points as part of one group (complete pool) vs. as independent groups (no pool).

Here we see an example of Simpson’s paradox: the slope on std_age is much larger when grouping isn’t considered than when it is.

# Build df for animation, first with no pool coefs
pool_to_nopool <- tibble(model="no_pool",
                         intercept= coef(no_pool)[,1],
                         slope = coef(no_pool)[,2])

# Add complete pool coefs (needs to add repeats)
tmp_df <- tibble(model="complete_pool",
                 intercept=rep(coef(complete_pool)[1], nrow(pool_to_nopool)),
                 slope = rep(coef(complete_pool)[2], nrow(pool_to_nopool)))

pool_to_nopool <- bind_rows(pool_to_nopool, tmp_df)

# Animate
animate(ggplot(pool_to_nopool, aes(x =intercept, y=slope)) +
          geom_point(aes(color=model, group=1L)) +
          scale_color_discrete(name="", 
                             labels=c("Complete Pooling", "No Pooling")) +
          labs(x=expression(paste(hat(beta[0]), " (Intercept)")),
               y=expression(paste(hat(beta[1]), " (Standardized Age)")),
               title="In the no pooling model each county has its own slope and intercept") +
          theme_bw()+
          theme(legend.position="bottom",
                title= element_text(size=7)) +
          transition_states(model, transition_length = 1, state_length = 1,
                            wrap=FALSE) +
          shadow_wake(wake_length = 0.1, alpha = FALSE),
  res=120, width=700)

Partial-Pooling (Hierarchical / Mixed-Effects Model)

Here’s where taking the nested (hierarchical) structure into account can help mitigate some of the issues listed above, and also help us answer the initial question.

Here I include a random intercept and a random slope per county in the model. (Note: you could consider models with only one of these and see if either performs better than including both.)

In the output below, we can see that most of the between-county variance is explained by the random intercept.

# Fit mixed effects model with varying slope and intercept
lmm <- lmer(bounce_time ~ 1 + std_age + (1 + std_age|county), data=bounce_data)

summary(lmm)
## Linear mixed model fit by REML ['lmerMod']
## Formula: bounce_time ~ 1 + std_age + (1 + std_age | county)
##    Data: bounce_data
## 
## REML criterion at convergence: 4313
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1235 -0.6105  0.0584  0.6195  3.3265 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr
##  county   (Intercept) 195.70539 13.9895      
##           std_age       0.06486  0.2547  1.00
##  Residual              62.98513  7.9363      
## Number of obs: 613, groups:  county, 8
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 200.8092     4.9715  40.392
## std_age       1.3029     0.4783   2.724
## 
## Correlation of Fixed Effects:
##         (Intr)
## std_age 0.182 
## convergence code: 0
## boundary (singular) fit: see ?isSingular

Diagnostics

The fitted vs. residual plot looks good with no trend; however, it is very similar to that of the no pooling model.

# Build diagnostic plot
partial_pool_df <- tibble(fit = fitted(lmm),
                     resid = residuals(lmm))

qplot(x= fit, y= resid, data = partial_pool_df, alpha=0.8) +
  geom_smooth(se = FALSE, size=1) +
  labs(y="Observed - Predicted", x= "Predicted", 
      title="Partial pooling model shows better fit (variance looks constant)") +
  theme(legend.position = "none")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Let’s look at a Q-Q plot to check whether the normality assumption is met. Here we expect the sample quantiles to follow the theoretical quantiles of a normal distribution (black line).

We see that the partial-pooling (mixed effects) model is slightly better in that its residuals are closer to the 1-1 line, especially in the tails, compared to the no pooling model.

# compare if partial pool and no pool residual distributions are normal
qq_df <- tibble(partial_resid = resid(lmm),
                no_pool_resid = resid(no_pool)) %>% 
  pivot_longer(cols = c("partial_resid", "no_pool_resid"),
              names_to="model", values_to="residuals")

ggplot(qq_df, aes(sample=residuals, color=model )) +
  stat_qq(alpha=0.5, size=3) + 
  stat_qq_line(color="black") +
  labs(x="Theoretical Quantiles", y="Sample Quantiles",
       title="Normal Q-Q plot: both models have nearly identical residual normal distributions") +
  theme(title=element_text(size=10))

Evaluation

The RMSE of the no pooling model is slightly smaller, so predictive performance doesn’t really improve with partial pooling in our scenario. However, we are now able to answer the initial question, which we couldn’t with the no pooling model.

# Generate predictions
lmm_yhat <- predict(lmm)

# Add partial pool RMSE
np_preds <- np_preds %>% 
  mutate(partial_pool = rmse(bounce_data$bounce_time, lmm_yhat))

# Compare to other models
kable(np_preds, digits = 3) %>% 
  row_spec(0, background = "#4CAF50", color="#FFF") %>% 
  kable_styling(full_width = FALSE, position = "left")
complete_pool no_pool partial_pool
RMSE 14.091 7.868 7.878

No pooling to partial-pooling

So why is this happening, and what does it look like?

It happens because we are now assuming the groups come from the same population, instead of treating them as independent groups from distinct populations (no pooling), and thus they share characteristics to some degree.
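This assumption can be written down as a generative model: county intercepts are drawn from one common population distribution, and observations are drawn around their county’s intercept. A minimal simulation sketch (the parameter values below are illustrative, not fitted):

```r
# Sketch of the assumed hierarchical generative structure (illustrative values)
set.seed(42)
mu_pop <- 200   # population mean bounce time
tau    <- 14    # between-county standard deviation
sigma  <- 8     # within-county (residual) standard deviation
n_per  <- 20    # observations per county

county_means <- rnorm(8, mean = mu_pop, sd = tau)   # counties share one population
y <- rnorm(8 * n_per, mean = rep(county_means, each = n_per), sd = sigma)

length(y)  # 160 simulated bounce times, 20 per county
```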

Here we can observe the shrinkage of estimates and standard errors (SE) for each of the random effects (i.e. intercept and slope per county).

However, it is important to note that fitting this model also produced singular fit warnings, which indicate instability (see the shrinkage of the std_age estimate tab for more info).

Note: it is important to remember that shrinkage might not be desired in some cases. For example, if our response is mortality rates in hospitals we might not want to “shrink” mortalities of smaller hospitals just because they have a smaller sample size. It might lead to erroneous interpretations.

Shrinkage std_age and intercept

Here we can observe how shrinkage changes between the complete pooling, no pooling and partial pooling models.

However, why do some county estimates exhibit larger changes (i.e. more displacement) between the no pooling and partial pooling models? (See the next tab for answers.)

pool_to_nopool <- bind_rows(pool_to_nopool,
                             tibble(model = 'partial_pool',
                                    intercept=coef(lmm)$county[,1],
                                    slope=coef(lmm)$county[,2]))

# Animate
animate(ggplot(pool_to_nopool, aes(x =intercept, y=slope)) +
          geom_point(aes(color=model, group=1L)) +
          scale_color_discrete(name="", 
                             labels=c("Complete Pooling", "No Pooling", 
                                      "Partial Pooling")) +
          labs(x=expression(paste(hat(beta[0]), " (Intercept)")),
               y=expression(paste(hat(beta[1]), " (Standardized Age)")),
               title="Effects on std_age and intercept estimates for various pooling models") +
          theme_bw()+
          theme(legend.position="bottom",
                title= element_text(size=7)) +
          transition_states(model, transition_length = 1, state_length = 1,
                            wrap=FALSE) +
          shadow_wake(wake_length = 0.2, alpha = FALSE),
  res=120, width=700)

Shrinkage of Intercept estimate

It happens because the partial pooling parameter estimates for a particular group \(j\) are related to the group’s sample size \(n_j\) and to the complete pool and unpooled estimates. For example, in a varying-intercept model the average bounce_time for county \(j\) is approximately a weighted average of the unpooled estimate \(\bar{y}_j\) and the complete pool estimate \(\bar{y}_{all}\). The weights involve the within-group (residual) variance \(\sigma^2\) and the between-group variance \(\tau^2\):

\[\hat{\alpha}_j \approx \frac{\frac{n_j}{\sigma^2}}{\frac{n_j}{\sigma^2} + \frac{1}{\tau^2}}(\bar{y}_j - \beta x) + \frac{ \frac{1}{\tau^2}}{\frac{n_j}{\sigma^2} + \frac{1}{\tau^2}}\bar{y}_{all}\]

Observe below how both the estimates and standard errors (SE) shrink when we fit the mixed effects model (partial pooling). The effect is stronger in groups with small sample sizes and weaker in those with larger sample sizes.

This is why we say that estimates in hierarchical models “borrow strength”: groups with small sample sizes borrow more strength from those with larger samples (which makes sense, because a larger sample size means more information and more stable estimates).
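To make the formula concrete, here is a small numeric sketch of the precision-weighted average, ignoring the slope term for simplicity. The variance values are taken roughly from the lmer output above; the group means are hypothetical:

```r
# Sketch: partial-pool intercept as a precision-weighted average
sigma2    <- 63    # within-group (residual) variance, roughly lmm's estimate
tau2      <- 196   # between-group variance, roughly lmm's county intercept variance
y_bar_all <- 200   # complete pool (population) mean
y_bar_j   <- 220   # hypothetical unpooled mean for one county

shrink_intercept <- function(n_j) {
  w_j <- (n_j / sigma2) / (n_j / sigma2 + 1 / tau2)  # weight on the group's own mean
  w_j * y_bar_j + (1 - w_j) * y_bar_all
}

# Small groups are pulled more toward the population mean than large groups
shrink_intercept(7) < shrink_intercept(150)  # TRUE
```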

# Collect all confidence intervals and estimates
ci_np <- confint(no_pool) # 95% confidence interval
ci_pp <- se.coef(lmm)$county

np_betas <- coef(no_pool) # estimates no pool
pp_betas <- coef(lmm)$county

# Prepare data for intercept shrinkage animation
county_n <- bounce_data %>% group_by(county) %>% count()
beta0_df <- tibble(model = "No Pool",
                  estimate = np_betas[,1],
                  lower = ci_np[,,"(Intercept)"][,1],
                  upper = ci_np[,,"(Intercept)"][,2])

beta0_pp <- tibble(model = "Partial Pool",
                 estimate = pp_betas[,1]) %>% 
  mutate(lower = estimate - 1.96*ci_pp[,1],
         upper = estimate + 1.96*ci_pp[,1])

beta0_df <- bind_rows(beta0_df, beta0_pp) %>% 
  mutate(n = rep(county_n$n, 2))


animate(ggplot(beta0_df, aes(x=n, y=estimate)) +
          geom_point(aes(color=model, group=1L), size= 2, alpha=0.4) +
          geom_errorbar(aes(ymin = lower, ymax = upper, color=model, group=1L),
                        width=6) +
          geom_hline(aes(yintercept =coef(complete_pool)[1]), 
                     color="black", alpha=0.4) +
          geom_text(aes(label ="Pop avg",
                         x= 10, y =coef(complete_pool)[1]-2),
                    size=3)+
          scale_color_discrete(name="", 
                             labels=c("No Pooling", "Partial Pooling")) +
          labs(y=expression(paste(hat(beta[0]), " (Intercept) Estimate")),
               x="Group sample size (n)",
               title="Partial-pooling shrinks no-pooling Intercept estimates and Std Errors") +
          theme_bw()+
          theme(legend.position="bottom",
                title= element_text(size=7)) +
          transition_states(model, transition_length = 0.7, state_length = 1,
                            wrap=FALSE),
        res=120, width=700, height=600)

Shrinkage of std_age estimate

Recall the singular fit message mentioned earlier in this document. The diagram below showcases the consequences of that warning, which indicates that the model is too complex for the amount of data available in each group.
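A quick way to confirm such a degenerate fit is lme4's `isSingular()` together with `VarCorr()`. Here is a small self-contained sketch on simulated data (hypothetical, not the bounce dataset): small groups, a group-level intercept shift, and no true random slope.

```r
library(lme4)

# Simulated nested data: 8 groups of 4 observations each
set.seed(1)
d <- data.frame(g = rep(letters[1:8], each = 4), x = rnorm(32))
d$y <- 200 + 5 * as.integer(factor(d$g)) + 1.2 * d$x + rnorm(32, sd = 10)

# Random-slope model that the data cannot really support
fit <- lmer(y ~ x + (1 + x | g), data = d)

isSingular(fit)  # TRUE flags a variance component estimated at its boundary
VarCorr(fit)     # look for a random-slope SD near 0 or a correlation at +/-1
```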

If we fit a model with no random slope we don’t get the singular fit message, and our RMSE is nearly the same as the one from the model including the random slope.

rintercept <- lmer(bounce_time ~ std_age + (1|county), data=bounce_data)

np_preds <- np_preds %>% 
  mutate(rintercept_only = rmse(bounce_data$bounce_time, 
                                predict(rintercept)))

# Table
kable(np_preds, digits = 3) %>% 
  row_spec(0, background = "#4CAF50", color="#FFF") %>% 
  kable_styling(full_width = FALSE, position = "left")
complete_pool no_pool partial_pool rintercept_only
14.091 7.868 7.878 7.88

This means that the random intercept estimate accounts for most of the between-county variability. Thus, in simple terms, when we attempt to estimate the random slope from the little variance that remains, given the small sample sizes, the model can’t produce stable estimates. As a result we observe a slight shrinkage of the estimates but an extreme shrinkage of the standard errors.

Here is a situation where a Bayesian framework can assist and provide stable estimates.

beta1_df <- tibble(model = "No Pool",
                  estimate = np_betas[,2],
                  lower = ci_np[,,"std_age"][,1],
                  upper = ci_np[,,"std_age"][,2])

beta1_pp <- tibble(model = "Partial Pool",
                 estimate = pp_betas[,2]) %>% 
  mutate(lower = estimate - 1.96*ci_pp[,2],
         upper = estimate + 1.96*ci_pp[,2])

beta1_df <- bind_rows(beta1_df, beta1_pp) %>% 
  mutate(n = rep(county_n$n, 2))


animate(ggplot(beta1_df, aes(x=n, y=estimate)) +
          geom_point(aes(color=model, group=1L), size=2, alpha=0.4) +
          geom_errorbar(aes(ymin = lower, ymax = upper, color=model, group=1L),
                        width= 6) +
          geom_hline(aes(yintercept =coef(complete_pool)[2]), 
                     color="black", alpha=0.4) +
          geom_text(aes(label ="Pop avg",
                         x= 10, y =coef(complete_pool)[2]+0.5),
                    size=3)+
          scale_color_discrete(name="", 
                             labels=c("No Pooling", "Partial Pooling")) +
          labs(y=expression(paste(hat(beta[1]), " (std_age) Estimate")),
               x="Group sample size (n)",
               title="Partial-pooling shrinks no-pooling slope (std_age) estimates and Std Errors") +
          theme_bw()+
          theme(legend.position="bottom",
                title= element_text(size=7)) +
          transition_states(model, transition_length = 0.7, state_length = 1,
                            wrap=FALSE),
        res=120, width=700, height=600)

Bayesian Mixed-Effects Models

Here we will use the brms library which, because it uses a syntax similar to the lme4 functions above, makes it very simple to implement a Bayesian hierarchical model using Stan.
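Before writing priors ourselves, brms can tell us which parameter classes the model exposes. The `get_prior()` helper lists them from the formula and data without fitting anything:

```r
library(brms)

# List the parameter classes (Intercept, b, sd, cor, sigma) that accept priors
get_prior(bounce_time ~ 1 + std_age + (1 + std_age | county),
          data = bounce_data, family = gaussian())
```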

Selecting a prior

We quickly run some prior predictive checks with simulated data to choose weakly informative priors.

# Simulate 
y_sim <- rep(0,nrow(bounce_data))
for (i in 1:nrow(bounce_data)){
  sigma <- rhcauchy(1, sigma = 1)
  mu <- rnorm(1, mean=200, sd=100) + (rnorm(1, mean=1, sd=1)*bounce_data$std_age[i])
  y_sim[i] <- rnorm(1, mean=mu, sd=sigma)
}

tibble(bounce_time = bounce_data$bounce_time, sim_data = y_sim) %>% 
  ggplot(., aes(x=bounce_time, y=sim_data)) +
  geom_point() +
  labs(x="Empirical bounce times", y="Simulated data")
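The simulation loop above can equivalently be written in vectorized form, drawing one value of each parameter per observation under the same priors:

```r
library(extraDistr)  # for rhcauchy()

# Vectorized prior predictive draws, matching the loop above
n <- nrow(bounce_data)
sigma_sim <- rhcauchy(n, sigma = 1)                # half-Cauchy residual SDs
mu_sim    <- rnorm(n, mean = 200, sd = 100) +
             rnorm(n, mean = 1, sd = 1) * bounce_data$std_age
y_sim_vec <- rnorm(n, mean = mu_sim, sd = sigma_sim)
```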

Modeling

We can now fit the brms model and obtain estimates for the county intercepts and slopes with associated 95% credible intervals. Note that the credible intervals for the std_age variable remain stable instead of vanishing.

bayes_lmm <- brm(bounce_time ~ 1 + std_age + (1 + std_age|county), 
                 data=bounce_data,
                 family=gaussian(),
                 prior= c(prior(normal(200, 100), class = Intercept),
                          prior(normal(1, 1), class = b),
                          prior(cauchy(0, 1), class = sigma),
                          prior(cauchy(0, 1), class = sd)),
                 warmup = 2000, 
                 iter = 5000, 
                 chains = 2, control = list(adapt_delta = 0.95))
## Compiling the C++ model
## Start sampling
## 
## SAMPLING FOR MODEL '42646b01cf833a7a05c4f64cf38613aa' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.00017 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.7 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)
## Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
## Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
## Chain 1: Iteration: 1500 / 5000 [ 30%]  (Warmup)
## Chain 1: Iteration: 2000 / 5000 [ 40%]  (Warmup)
## Chain 1: Iteration: 2001 / 5000 [ 40%]  (Sampling)
## Chain 1: Iteration: 2500 / 5000 [ 50%]  (Sampling)
## Chain 1: Iteration: 3000 / 5000 [ 60%]  (Sampling)
## Chain 1: Iteration: 3500 / 5000 [ 70%]  (Sampling)
## Chain 1: Iteration: 4000 / 5000 [ 80%]  (Sampling)
## Chain 1: Iteration: 4500 / 5000 [ 90%]  (Sampling)
## Chain 1: Iteration: 5000 / 5000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 20.0046 seconds (Warm-up)
## Chain 1:                20.1954 seconds (Sampling)
## Chain 1:                40.1999 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL '42646b01cf833a7a05c4f64cf38613aa' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 6.3e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.63 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 5000 [  0%]  (Warmup)
## Chain 2: Iteration:  500 / 5000 [ 10%]  (Warmup)
## Chain 2: Iteration: 1000 / 5000 [ 20%]  (Warmup)
## Chain 2: Iteration: 1500 / 5000 [ 30%]  (Warmup)
## Chain 2: Iteration: 2000 / 5000 [ 40%]  (Warmup)
## Chain 2: Iteration: 2001 / 5000 [ 40%]  (Sampling)
## Chain 2: Iteration: 2500 / 5000 [ 50%]  (Sampling)
## Chain 2: Iteration: 3000 / 5000 [ 60%]  (Sampling)
## Chain 2: Iteration: 3500 / 5000 [ 70%]  (Sampling)
## Chain 2: Iteration: 4000 / 5000 [ 80%]  (Sampling)
## Chain 2: Iteration: 4500 / 5000 [ 90%]  (Sampling)
## Chain 2: Iteration: 5000 / 5000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 24.4384 seconds (Warm-up)
## Chain 2:                19.9037 seconds (Sampling)
## Chain 2:                44.3421 seconds (Total)
## Chain 2:
coef(bayes_lmm)
## $county
## , , Intercept
## 
##          Estimate Est.Error     Q2.5    Q97.5
## cheshire 212.1098 0.7126664 210.7458 213.5192
## cumbria  208.1223 0.8113930 206.5753 209.7085
## devon    197.8523 1.1180354 195.6907 200.0720
## dorset   200.9397 0.9230435 199.0733 202.7523
## essex    208.0806 1.7160049 204.7643 211.4593
## kent     218.2023 3.0515210 212.2322 224.1370
## london   180.5987 1.5357005 177.5877 183.5869
## norfolk  181.0615 0.9133083 179.2343 182.8198
## 
## , , std_age
## 
##          Estimate Est.Error       Q2.5    Q97.5
## cheshire 1.309158 0.5697589  0.2073636 2.466831
## cumbria  1.260816 0.5590676  0.1500384 2.363324
## devon    1.244711 0.5596847  0.1477928 2.348210
## dorset   1.348556 0.5564374  0.2989347 2.526872
## essex    1.229998 0.6385158 -0.1000667 2.433902
## kent     1.317830 0.7418647 -0.1596214 2.828374
## london   1.143667 0.7569504 -0.5398298 2.595089
## norfolk  1.122528 0.6868518 -0.3551716 2.433686

Diagnostics

Here we can make use of the bayesplot package.

Posterior predictive distribution bounce_time

color_scheme_set("red")
ppc_dens_overlay(y = bounce_data$bounce_time,
                 yrep = posterior_predict(bayes_lmm, nsamples = 50)) +
  labs(x="Bounce time (secs)") +
  panel_bg(fill = "gray95", color = NA) +
  grid_lines(color = "white")

Posterior predictive checks per county

color_scheme_set("brightblue")
bayes_lmm %>%
  posterior_predict(nsamples = 500) %>%
  ppc_stat_grouped(y = bounce_data$bounce_time,
                   group = bounce_data$county,
                   stat = "mean") +
  labs(x= "Bounce times (secs)") +
  panel_bg(fill = "gray95", color = NA) +
  grid_lines(color = "white")

Posterior predictive checks: MCMC Divergence

color_scheme_set("darkgray")
mcmc_scatter(
  as.matrix(bayes_lmm),
  pars = c("sd_county__Intercept",
           "r_county[london,Intercept]"),
  np = nuts_params(bayes_lmm),
  np_style = scatter_style_np(div_color = "green", div_alpha = 0.8)
) +
  labs(x = "Std Dev of County Intercepts",
       y = "Intercept of county london",
       title = "(No green dots means no divergences, thus good mixing and a well-behaved posterior)")
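Beyond divergences, it is also worth checking \(\hat{R}\) and effective sample sizes. A short sketch using the `rhat()` and `neff_ratio()` helpers on the fitted model from above:

```r
library(bayesplot)

# Convergence summaries for the fitted brms model `bayes_lmm`
mcmc_rhat(rhat(bayes_lmm))        # all Rhat values should sit near 1
mcmc_neff(neff_ratio(bayes_lmm))  # effective-sample-size ratios; higher is better
```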

Evaluation

The predictive performance is about the same, but let’s look at what happens to the std_age estimates and standard errors when we use a Bayesian framework.

y_hat_bayes <- colMeans(posterior_predict(bayes_lmm,
                                         nsamples=1000))

# Add Bayes RMSE
np_preds <- np_preds %>% 
  mutate(bayes = rmse(bounce_data$bounce_time, y_hat_bayes))

# Compare to other models
kable(np_preds, digits = 3) %>% 
  row_spec(0, background = "#4CAF50", color="#FFF") %>% 
  kable_styling(full_width = FALSE, position = "left")
complete_pool no_pool partial_pool rintercept_only bayes
14.091 7.868 7.878 7.88 7.865

Shrinkage of std_age estimate (Bayes)

bayes_beta1 <- coef(bayes_lmm)$county[,,"std_age"] %>% 
  data.frame() %>% 
  dplyr::select(-Est.Error) %>% 
  mutate(model ="PP Bayes",
         n=county_n$n)

colnames(bayes_beta1) <- c("estimate", "lower", "upper",
                           "model", "n")

beta1_df <- bind_rows(beta1_df, bayes_beta1)


animate(ggplot(beta1_df, aes(x=n, y=estimate)) +
          geom_point(aes(color=model, group=1L), size=2, alpha=0.4) +
          geom_errorbar(aes(ymin = lower, ymax = upper, color=model, group=1L),
                        width= 6) +
           geom_hline(aes(yintercept =coef(complete_pool)[2]), 
                     color="black", alpha=0.4) +
          geom_text(aes(label ="Pop avg",
                         x= 10, y =coef(complete_pool)[2]+0.5),
                    size=3) +
          scale_color_discrete(name="", 
                             labels=c("No pool", "Freq PP", "Bayes PP")) +
          labs(y=expression(paste(hat(beta[1]), " (std_age) Estimate")),
               x="Group sample size (n)",
               title="Bayesian Partial-Pooling doesn't shrink Std Errors as much as the Frequentist PP model ") +
          theme_bw()+
          theme(legend.position="bottom",
                title= element_text(size=7)) +
          transition_states(model, transition_length = 0.8, state_length = 1.2,
                            wrap=FALSE),
        res=120, width=700, height=600)

anim_save("beta1_shrinkage_bayes.gif", path = "gif/")

Bayesian linear fits

Here we see that for counties with more samples the predicted linear trends are very close together (we are more certain about those). In comparison, the trends in counties with small samples show lots of uncertainty (i.e. larger estimation errors).

add_fitted_draws(model=bayes_lmm, newdata=bounce_data, n = 100) %>%        
  ggplot(aes(x = std_age, y = bounce_time, color=county) ) +
  geom_point() +
  geom_line(aes(y = .value, group = .draw), alpha = 1/15, color = "#08519C") +
  facet_wrap(~county, scales="free") +
  theme(legend.position = "none")